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ABSTRACT 

We show that various milestones of high-redshift galaxy formation, such as the formation of the first stars or 
the complete reionization of the intergalactic medium, occurred at different times in different regions of the 
universe. The predicted spread in redshift, caused by large-scale fluctuations in the number density of galaxies, 
is at least an order of magnitude larger than previous expectations that argued for a sharp end to reionization. 
This cosmic scatter in the abundance of galaxies introduces new features that affect the nature of reionization 
and the expectations for future probes of reionization, and may help explain the present properties of dwarf 
galaxies in different environments. The predictions can be tested by future numerical simulations and may 
be verified by upcoming observations. Current simulations, limited to relatively small volumes and periodic 
boundary conditions, largely omit cosmic scatter and its consequences. In particular, they artificially produce 
a sudden end to reionization, and they underestimate the number of galaxies by up to an order of magnitude at 
redshift 20. 

Subject headings: galaxies: high-redshift, cosmology: theory, galaxies: formation 



1. INTRODUCTION 

Recent observation s of the cosmic microwave background 
( Sper gel et alJ 120031) have confirmed the notion that the 
present large-scale structure in the universe originated from 
small-amplitude density fluctuations at early cosmic times. 
Due to the natural instability of gravity, regions that were 
denser than average collapsed and formed bound halos, first 
on small spatial scales and later on larger and larger scales. 
At each snapshot of this cosmic evolution, the abundance of 
collapsed halos, whose masses are dominated by cold dark 
matter, can be computed from the initial conditions using nu- 
merical simulati ons and can be understood using approximate 
analytic models llPress & SchechteJ[l974l iBond et all 1 1991 ). 
The common understanding of galaxy formation is based on 
the notion that the constituent stars formed out of the gas that 
cooled and subsequently condensed to high densities in the 
cores of some of these halos (White & Rees 1978). 

T he standard analytic model for the abun dance of ha- 
los (Pr ess & Schechte3ll974tlBond et alJI 19911) considers the 
small density fluctuations at some early, initial time, and at- 
tempts to predict the number of halos that will form at some 
later time corresponding to a redshift z- First, the fluctuations 
are extrapolated to the present time using the growth rate of 
linear fluctuations, and then the average density is computed 
in spheres of various sizes. Whenever the overdensity (i.e., 
the density perturbation in units of the cosmic mean density) 
in a sphere rises above a critical threshold S c (z), the corre- 
sponding region is assumed to have collapsed by redshift z, 
forming a halo out of all the mass that had been included in 
the initial spherical region. In analyzing the statistics of such 
regions, the model separates the contribution of large-scale 
modes from that of small-scale density fluctuations. It pre- 
dicts that galactic halos w ill form earlier in regions that are 
overdense on large scales l| Kgisgrjri984: Bardeen et all l 198(1 
ICole & KaiserlfT98^ iMo & Whitelll99fl) . since these regions 



already start out from an enhanced level of density, and small- 
scale modes need only supply the remaining perturbation nec- 
essary to reach S c (z). On the other hand, large-scale voids 
should contain a reduced number of halos at high redshift. In 
this way, the analytic model describes the clustering of mas- 
sive halos. 

As gas falls into a dark matter halo, it can fragment into 
stars only if its virial temperature is above 10 4 K for cool- 
ing mediated by atomic transitions [ or ~ 500 K for molecul ar 
H2 cooling; see, e.g., Figure 12 in Barkana & Loeb (2001)]. 
The abundance of dark matter halos with a virial tempera- 
ture above this cooling threshold declines sharply with in- 
creasing redshift due to the exponential cutoff in the abun- 
dance of massive halos at early cosmic times. Consequently, 
a small change in the collapse threshold of these rare halos, 
due to mild inhomogeneities on much larger spatial scales, 
can change the abundance of such halos dramatically. The 
modulation of galaxy formation by long wavelength modes 
of density fluctuations is therefore amplified considerably at 
high redshift. In this paper we show that this results in major 
new predictions for high-redshift observations. The implica- 
tions are particularly significant for cosmic reionization and 
all observational probes of this epoch. 

This paper is organized as follows. In § 2 we quantify the 
scatter in the statistics of galaxy formation produced by this 
amplification effect. We first explain in § 2.1 the basic physi- 
cal ideas and implications using the well-established extended 
Press-Schechter model. We then present in § 2.2 a simple 
idea that yields a much more accurate model that fits an ar- 
ray of previous simulations at low redshift. We demonstrate 
the qualitative correctness of our basic assumptions as well 
as the quantitative accuracy of our model by matching results 
from recent simulations at high redshift. Since high-redshift 
galaxies provide the UV photons that lead to the reionization 
of the intergalactic medium (hereafter IGM), a large scatter 



2 



is also expected in the reionization redshift within different 
regions in the universe. We consider this scatter and the mod- 
ified character of reionization in § 3.1, and show in § 3.2 that 
existing numerical simulations do not include fluctuations on 
sufficiently large scales at high redshift. In § 3.3 we discuss 
the observational implications of the large cosmic scatter ex- 
pected at high redshift. Finally, we summarize our main re- 
sults in § 4. 

2. HALO MASS FUNCTION IN DIFFERENT ENVIRONMENTS 

2.1. Basic Model: Amplification of Density Fluctuations 

Galaxies at high redshift are believed to form in all halos 
above some minimum mass M m m that depends on the effi- 
ciency of atomic and molecular transitions that cool the gas 
within each halo. This makes useful the standard quantity 
of the collapse fraction F co \(M m i n ), which is the fraction of 
mass in a given volume that is contained in halos of individ- 
ual mass M m j n or greater. If we set M m j n to be the minimum 
halo mass in which efficient cooling processes are triggered, 
then F C0 \(M m i n ) is the fraction of all the baryons in the uni- 
verse that lie in galaxies. In a large-scale region of comoving 
radius R with a m ean overdensity Sr, the standard result is 
JBond et all! 9911) 



^coi(M min ) = erfc 



5c(z)-S R 



(1) 



where S(R) is the variance of fluctuations in spheres of radius 
R, and S(R m i n ) is the variance in spheres of radius R m [ D cor- 
responding to the region at the initial time that contained a 
mass M m i n . In particular, the cosmic mean value of the col- 
lapse fraction is obtained in the limit of R — ► oo by setting 
5r and S(R) to zero in this expression. Throughout this sec- 
tion our results assume this standard model, known as the ex- 
tended Press-Schechter model, which we apply to a universe 
with cosmological parameters matching the l atest observa- 
tions [ specifically, the running index model of Snergel et al. 
(2003)]. Whenever we consider a cubic region, we estimate 
its halo abundance by applying the model to a spherical region 
of equal volume. Note also that we consistently quote values 
of comoving distance, which equals physical distance times a 
factor of (1 + z). 

Our results are based on a simple idea. At high redshift, 
galactic halos are rare and correspond to high peaks in the 
Gaussian probability distribution of initial fluctuations. A 
modest change in the overall density of a large region mod- 
ulates the threshold for high peaks in the Gaussian density 
field, so that the number of galaxies is exponentially sensitive 
to this modulation. This amplification of large-scale modes is 
responsible for the large statistical fluctuations that we find. 

In numerical simulations, periodic boundary conditions are 
usually assumed, and this forces the mean density of the box 
to equal the cosmic mean density. The abundance of halos as 
a function of mass is then biased in such a box (see Figure 1), 
since a similar region in the real universe will have a distri- 
bution of different overdensities Sr. At high redshift, when 
galaxies correspond to high peaks, they are mostly found in 
regions with an enhanced large-scale density. In a periodic 
box, therefore, the total number of galaxies is artificially re- 
duced, and the relative abundance of galactic halos with dif- 
ferent masses is artificially tilted in favor of lower-mass halos. 
We illustrate our results for two sets of parameters, one cor- 
responding to the first galaxies and early reionization (z = 20) 
and the other to the current horizon in observations of galaxies 
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FIG. 1. — Bias in the halo mass distribution in simulations. We show the 
amount of mass contained in all halos of individual mass M m j„ or greater, 
expressed as a fraction of the total mass in a given volume. This cumulative 
fraction F co i(M ra j n ) is shown as a function of the minimum halo mass M ra j n . 
We consider two cases of redshift and simulation box size, namely Z = 7, 
'box = 6 Mpc (upper curves), and z = 20, / DOX = 1 Mpc (lower curves). At 
each redshift, we compare the true average distribution in the universe (dotted 
curve) to the biased distribution (solid curve) that would bemeasured in a 
simulation box with periodic boundary conditions (for which 8r is artificially 
set to zero). 



and late reionization (z = 7). We consider a resolution equal to 
that of state-of-the-art cosmological simulations that include 
gravity and gas hydrodynamics. Specifically, we assume that 
the total number of dark matter particles in the simulation is 
N = 324 3 , and that the smallest halo th at can form a galaxy 
must b e resolved into 500 particles; Sprinael & Hernquist 
(2003) showed that this resolution is necessary in order to de- 
termine the star formation rate in an individual halo reliably to 
within a factor of two. Therefore, if we assume that halos that 
cool via molecular hydrogen must be resolved at z = 20 (so 
that M m ; n = 7 x 10 5 M Q ), and only those that cool via atomic 
transitions must be resolved at z = 7 (so that M m ; n = 10 8 M Q ), 
then the maximum box sizes that can currently be simulated 
are lb ox = 1 Mpc and /box = 6 Mpc at these two redshifts, re- 
spectively. 

At each redshift we only consider cubic boxes large enough 
so that the probability of forming a halo on the scale of the en- 
tire box is negligible. In this case, <5# is Gaussian distributed 
with zero mean and variance S(R), since the no-halo condi- 
tion \/S(R) <C 5 c (z) implies that at redshift z the perturbation 
on the scale R is still in the linear regime. We can then calcu- 
late the probability distribution of collapse fractions in a box 
of a given size (see Figure 2). This distribution corresponds 
to a real variation in the fraction of gas in galaxies within 
different regions of the universe at a given time. In a numer- 
ical simulation, the assumption of periodic boundary condi- 
tions eliminates the large-scale modes that cause this cosmic 
scatter. Note that Poisson fluctuations in the number of ha- 
los within the box would only add to the scatter, although the 
variations we have calculated are typically the dominant fac- 
tor. For instance, in our two standard examples, the mean 
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FIG. 2. — Probability distribution within a small volume of the total mass 
fraction in galactic halos. The normalized distribution of the logarithm of 
this fraction Fcol(^min) is shown for two cases: z = 7, /box = 6 Mpc, M m ; n = 
10 8 Mq (upper panel), and z = 20, /b ox = 1 Mpc, M min = 7 X 1O 5 M (bottom 
panel). In each case, the value in a periodic box (Sr = 0) is shown (central 
dashed vertical line), along with the value that would be expected given a 
plus 1 — a (right dashed line) or a minus 1 — a (left dashed line) fluctuation in 
the mean density of the box. Also shown in each case is the mean value of 
fcolCWmin) averaged over large cosmological volumes (solid vertical line). 
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FIG. 3. — Cosmic scatter and numerical bias, expressed as the change in 
redshift needed to get the correct cosmic mean of the collapse fraction. We 
show the l-cr scatter (about the biased value) in the redshift of reioniza- 
tion, or any other phenomenon that depends on the mass fraction in galax- 
ies (bottom panel), as well as the redshift bias [expressed as a fraction 
of (1+z)] in periodic simulation boxes (upper panel). The bias is shown 
for M mill = 7 X 1O 5 M0 (solid curve), M min = 1O 8 M (dashed curve), and 
^min = 3 X 10 i0 Mq (dotted curve). The bias is always negative, and we 
show its absolute value. When expressed as a shift in redshift, the scatter is 
independent of M mm . 



expected number of halos in the box is 3 at z = 20 and 900 
at z = 7, resulting in Poisson fluctuations of a factor of about 
2 and 1.03, respectively, compared to the clustering-induced 
scatter of a factor of about 16 and 2 in these two cases. 

Within the extended Press-Schechter model, both the nu- 
merical bias and the cosmic scatter can be simply described 
in terms of a shift in the redshift (see Figure 3). In general, 
a region of radius R with a mean overdensity 5 K will contain 
a different collapse fraction than the cosmic mean value at a 
given redshift z. However, at some wrong redshift z + Az this 
small region will contain the cosmic mean collapse fraction at 
Z. At high redshifts (z > 3), this shift in redshift can be easily 
derived from eq. 1 to be 



S R 

Az=/-(l+z)x 

00 



S(R) 

S(Rmin) 



(2) 



where So = S r (z)/(l+ z) is approximately constant at high red- 
shifts (Peebles 1980), and equals 1.28 for our assumed cosmo- 
logical parameters. Thus, in our two standard examples, the 
bias is -2.6 at z = 20 and -0.4 at z = 7, and the one-sided 1 — a 
scatter is 2.4 at z = 20 and 1 .2 at z = 7. 

2.2. Improved Model: Matching Numerical Simulations 

In this subsection we develop an improved model that fits 
the results of numerical simulations more accurately. The 
model constructs the halo mass distribution (or mass func- 
tion); cumulative quantities such as the collapse fraction or 
the total number of galaxies can then be determined from it 
via integration. We first define f(S c (z),S)dS to be the mass 
fraction contained at z within halos with mass in the range 



corresponding to S to S+dS. The halo abundance is then 
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dM 
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M 



dS 



dM 



f(Sc(z),S) 



(3) 



where dn is the comoving number density of halos with 
masses in the range M to M + dM. In the model of 
iPress & Schechterl (119741) . 



M(S c (z),S) = 



exp 



(4) 



where v = S c (z)/VS is the number of standard deviations that 
the critical collapse overdensity represents on the mass scale 
M corresponding to the variance S. 

However, the Press-Schechter mass function fits numerical 
simulations only roughly, and in particular it substantially un- 
derestimates the abundance of the rare halos that host galaxies 
at high redshift. The halo mass fu nction of ISheth & Tormenl 
( 1999, see also She th et aTll2001l) adds two free parameters 
that al low it to fit n umerical simulations much more accu- 
rately (Jenkins et al. 2001). We note that these simulations 
followed very large volumes at low redshift, so that cosmic 
scatter did not compromise their accuracy. The matching 
mass function is given by 



f S T(S c (z),S) = A'^^- 



1 + 



1 



{a'v 2 Y 



exp 



a v 



(5) 



with best-fit parameters ( Sheth & Tormen 2002) a' = 0.75 and 
q' = 0.3, and where normalization to unity is ensured by taking 
A 1 = 0.322. 
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In order to calculate cosmic scatter we must determine 
the biased halo mass function in a given volume at a given 
mean density. Within the extended Press-Schechter model 
llBond etalJll99l . the halo mass distribution in a region of 
comoving radius R with a mean overdensity 5r is given by 

/bias-ps(«z), 6k,R, S) = f PS (S c (z)-S R ,S-S(R)) . (6) 

The corresponding collapse fraction in this case is given 
simply by eq. 0. Despite the relatively low accuracy of 
the Press-Schechter mass function, the relative change is 
predicted rather accurately by the extended Press-Schechter 
model. In other words, the prediction for the halo mass 
function in a given volume compared to the cosmic mean 
mass function provides a good fit to numerical simula- 
tions over a wide range of parameters (iMo & White! 1 19961 
ICasas-Miranda et all2002tlSheth & Tormenl2002l). 

For our improved model we adopt a hybrid approach that 
combines various previous models with each applied where it 
has been found to closely match numerical simulations. We 
obtain the halo mass function within a restricted volume by 
starting with the Sheth-Tormen formula for the cosmic mean 
mass function, and then adjusting it with a relative correction 
based on the extended Press-Schechter model. In other words, 
we set 



fbiMz),d R ,R,S) = 

f ^n^v \M(6c(z)-S R ,S-S(R)) 
/st(<w),S) x 



M(Sc(z),S) 



(7) 



As noted, this model is based on fits to simulations at low 
redshifts, but we can check it at high redshifts as well. Fig- 
ure|4]shows the number of galactic hal os at z ~ 15- 30 in two 
numerical simulations run bv lYoshida etafl (120031 and our 
predictions given the cosmological input parameters assumed 
by each simulation. The close fit to the simulated data (with 
no additional free parameters) suggests that our hybrid model 
(solid lines) improves on the extended Press-Schechter model 
(dashed lines), and can be used to calculate accurately the cos- 
mic scatter in the number of galaxies at both high and low red- 
shifts. The simulated data significantly deviate from the ex- 
pected cosmic mean [eq. Q, shown by the dotted line], due 
to the artificial suppression of large-scale modes outside the 
simulated box. We note that Yoshi daet alJ (|2003) mentioned 
that the lack of large-scale modes might produce a systemat- 
ically low halo abundance, particularly in the RSI model, but 
they did not quantify this effect. 

As an additional example, we consider the highest- 
resolution first star simulation (Abel et al. 2002), which used 
/ box = 128 kpc and M min = 7 x 10 5 M Q . The first star forms 
within the simulated volume when the first halo of mass M mm 
or larger collapses within the box. To compare with the simu- 
lation, we predict the redshift at which the probability of find- 
ing at least one halo within the box equals 50%, accounting 
for Poisson fluctuations. We find that if the simulation formed 
a population of halos corresponding to the correct cosmic av- 
erage [as given by eq. Q], then the first star should have 
formed already at z = 24.0. The first star actually form ed in 
the simulation box only at z = 18.2 lAbel et alJ l2002). Us- 
ing eq. Q we can account for the loss of large-scale modes 
beyond the periodic box, and predict a first star at z = 17.8, 
a close match given the large Poisson fluctuations introduced 
by considering a single galaxy within the box. 

The artificial bias in periodic simulation boxes can also 
be seen in the results of extensive numerical convergence 
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FIG. 4. — Halo mass function at high redshift in a 1 Mpc box at the cosmic 
mean density. Our hybrid model prediction (solid lines) is compared with 
the number of halos above mass 7 X 10 5 M© measured in the simulations of 
Yoshidaet al. 1 2003, data points are taken from their Figure 5). The cos- 
mic mean of the halo mass function (dotted lines) deviates significantly from 
the simulated values, since the periodic boundary conditions within the finite 
simulation box artificially set the amplitude of large-scale modes to zero. Our 
hybrid model starts with the Sheth-Tormen mass function and applies a cor- 
rection based on the extended Press-Schechter model; in doing so, it provides 
a better fit to numerical simulations than the pure extended Press-Schechter 
model (dashed lines) used in the previous figures. We consi der two sets of 
cosmo logical parameters, the scale-invariant ACDM model of Yos hida et all 
(2003) (upper curves), and their running scalar index (RSI) model (lower 
curves). 



tests carried out by Springel & Hernquist (2003). They pre- 
sented a large array of numerical simulations of galaxy for- 
mation run in periodic boxes over a wide range of box size, 
mass resolution, and redshift. In particular, we can identify 
several pairs of simulations where the simulations in each 
pair have the same mass resolution but different box sizes; 
this allows us to separate the effect of large-scale numerical 
bias from the effect of having poorly-resolved individual ha- 
los. Sp ecificall y, their simulations Zl and R4 [see Table 1 in 
Springel & Hernquist ( 2003)] used the same particle mass but 
R4 had a box length larger by a factor of 3.375 . The sim- 
ulations Ql and D4 are similarly related, as are Q2 and D5. 
In each case, the smaller simulation substantially underesti- 
m ated the star formation rate a t high redshift [see Figure 10 
in Springel & Hernquist ( 2003)], Zl by a factor of 5 at z = 15 
compared to R4, Ql by a factor of 3 at z = 8 compared to D4, 
and Q2 by a factor of 3 at z = 10 compared to D5. 

We note that there have been previous attempts to develop 
a model for the halo mass function in different environments, 
so that the model would be consistent with the Sheth-Tormen 
mass function of eq. (0 which accurately fits the cosmic mean 
mass function measured in numerical simulations. In order 
to identify the specific requirements for such a consistency, 
we first consider the analogous case of the extended Press- 
Schechter model and its relation to the Press-Schechter for- 
mula for the cosmic mean mass function. The extended Press- 
Schechter model is consistent with the mean mass function 
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in the sense that /bias-ps evaluated for an infinite box (i.e., 
in the limit where Sr and S{R) both vanish) yields the Press- 
Schechter mass function: 

/trias-Ps(^(z),0,CX),5) = /ps(Wz) ! 5) . (8) 

This condition does not suffice, however, since there is an ad- 
ditional self-consistency test that any viable model must sat- 
isfy. Consider any fixed scale R. Suppose we consider a very 
large number N of spheres of radius R within the universe. 
The mean density <5« in each sphere is determined accord- 
ing to a probability distribution p(5r); we assume that R is 
large enough so that the probability of forming a halo out of 
all the mass on the scale R is negligible, and so the distri- 
bution is a Gaussian with zero mean and variance S(R) (see 
also §2.1). The number of galaxies in each sphere is given in 
the extended Press-Schechter model by eq. (|6j. As N — > oo, 
the halo mass function averaged over all these spheres must 
approach the cosmic mean value, and it must also approach 
the ensemble-averaged mass function, where the averaging is 
performed over the probability distribution of Sr. This yields 
the following self-consistency requirement, which is indeed 
satisfied by the extended Press-Schechter model: 

J /bias-ps(<5 c (z) ,S R ,R 7 S)p(S R )d5 R = 

/bias-PsOUzXO.OO.S). (9) 

Now we again consider attempts to construct an improved 
model that is consistent with the Sheth-Tormen mass func- 
tion. Such a model must satisfy eq. (|8) (except with /st on 
the right-hand side), and it must also satisfy eq. l|9} in order 
to be self-consistent. The latter equation must be satisfied 
separately for every scale R large enough to avoid collapsing 
[i.e., mat satisfies yS(R ) <C S c (z)]. Previous p roposed models 
ilSheth & TormerJl2002l iGottlober et al]l2003l) satisfied sim- 
ple consistency but not the self-consistency test. Our hybrid 
model of eq. ([7} satisfies both requirements (with respect to 
the Sheth-Tormen mass function), a result that follows imme- 
diately from the fact that the extended Press-Schechter model 
also satisfies both requirements (with respect to the Press- 
Schechter mass function). Thus, our hybrid model is the first 
self-consistent model that is also consistent with the Sheth- 
Tormen mass function, at least when fluctuations are consid- 
ered on large scales R for which Sr is Gaussian distributed. As 
demonstrated in this section, our model also matches results 
from a wide array of numerical si mulations JAbel et alJ2002l: 
Yoshida et al.ll2003|lMo & Whitdfl996t ICasas-Miranda et alJ 
2002H.Tenkins et all2001 ~ 

3. IMPLICATIONS 
3.1. The nature of reionization 

The photons of the cosmic microwave background have 
traveled to us mostly undisturbed after neutral atoms first 
formed in the universe at the cosmic recombination epoch. 
Radiation from the first generation of stars is thought to have 
reionized the hydrogen throughout the universe, transforming 
the IGM back into a hot and highly-ionized plasma. 

The popular view d eveloped in the l it erature 
JArons & Winged 119721 IFukugita & Kawasakil [l994[ 
Shapi ro et alJ 11994 lHaiman & Loebl Il997t IGnedini 120001 
Barkana & Loebl 120011) maintains that reionization ended 
with a fast, simultaneous, overlap stage throughout the 
universe. This view has been based on simple arguments and 
has been supported by numerical simulations with small box 



sizes. The underlying idea was that the ionized hydrogen 
(H II) regions of individual sources began to overlap when 
the typical size of each H II bubble became comparable to 
the distance between nearby sources. Since these two length 
scales were comparable at the critical moment, there is only a 
single timescale in the problem - given by the growth rate of 
each bubble - and it determines the transition time between 
the initial overlap of two or three nearby bubbles, to the 
final stage where dozens or hundreds of individual sources 
overlap and produce large ionized regions. Whenever two 
ionized bubbles were joined, each point inside their common 
boundary became exposed to ionizing photons from both 
sources, reducing the neutral hydrogen fraction and allowing 
ionizing photons to travel farther before being absorbed. 
Thus, the ionizing intensity inside H II regions rose rapidly, 
allowing those regions to expand into high-density gas that 
had previously recombined fast enough to remain neutral 
when the ionizing intensity had been low. Since each bubble 
coalescence accelerates the process, it has been thought that 
the overlap phase has the character of a phase transition and 
occur s rapidly. Inde ed, the best simulations of reionization to 
date JGnedinl20 00) found that the average mean free path of 
ionizing photons in the simulated volume rises by an order of 
magnitude over a redshift interval Az = 0.05 at z = 7. 

Our results substantially modify this commonly accepted 
picture for the development of reionization. Overlap is still 
expected to occur rapidly, but only in localized high-density 
regions, where the ionizing intensity and the mean free path 
rise rapidly even while other distant regions are still mostly 
neutral. In other words, the size of the bubble of an individual 
source is about the same in different regions (since most ha- 
los have masses just above M m [ n ), but the typical distance be- 
tween nearby sources varies widely across the universe. The 
strong clustering of ionizing sources on length scales as large 
as 30-100 Mpc introduces long timescales into the reioniza- 
tion phase transition. The sharpness of overlap is determined 
not by the growth rate of bubbles around individual sources, 
but by the ability of large groups of sources within overdense 
regions to deliver ionizing photons into large underdense re- 
gions. Simply put, the common view assumes that reioniza- 
tion occurred in patches a few Mpc in size, and that it ended 
nearly simultaneously in all of them. In reality, however, these 
two statements are contradictory. If the patches are a few Mpc 
in size, then there is a very large spread in their reionization 
redshifts. Conversely, if the spread is small, this implies (from 
Figure 3) that the patches must be far larger than is commonly 
assumed. 

Note that the recombination rate is higher in overdense re- 
gions because of their higher gas density. These regions still 
reionize first, though, despite the need to overcome the higher 
recombination rate, since the number of ionizing sources in 
these regions is increased even more strongly as a result of the 
dramatic amplification of large-scale modes discussed earlier. 

3.2. Limitations of current simulations 

The shortcomings of current simulations do not amount 
simply to a shift of ~ 10% in redshift and the elimination 
of scatter, for several reasons. First, the effect that we have 
identified can be expressed in terms of a shift in redshift only 
within the context of the extended Press-Schechter model, and 
only if the total mass fraction in galaxies is considered and not 
its distribution as a function of galaxy mass. The halo mass 
distribution should still have the wrong shape, resulting from 
the fact that Az in eq. 2 depends on M m ; n . Furthermore, in our 
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more accurate hybrid model (§ 2.2), the effect on the collapse 
fraction is no longer exactly equivalent to a shift in redshift. 
In any case, a self-contained numerical simulation cannot rely 
on approximate models and must directly evolve a very large 
volume 1 . 

The second reason that current simulations are limited 
is that at high redshift, when galaxies are still rare, the 
abundance of galaxies grows rapidly towards lower redshift. 
Therefore, a ~ 10% relative error in redshift implies that at 
any given redshift around z ~ 10-20, the simulation predicts 
a halo mass function that can be off by an order of magni- 
tude for halos that host galaxies (see Figures 1 and|4j. This 
large underestimate suggests that the first generation of galax- 
ies formed significantly earlier than indicated by recent simu- 
lations. This makes it easier to expl ain recent observati ons of 
the cosmic microwave background ( Spergel et alJl2003l) that 
suggest an early reionization at z ~ 15-20. 

The third reason for the failure of simulations arises from 
the large cosmic scatter. This scatter can fundamentally 
change the character of any observable process or feedback 
mechanism that depends on a radiation background. Simu- 
lations in periodic boxes eliminate any large-scale scatter by 
assuming that the simulated volume is surrounded by identi- 
cal periodic copies of itself. In the case of reionization, for 
instance, current simulations neglect the collective effects de- 
scribed above, whereby groups of sources in overdense re- 
gions may influence large surrounding underdense regions. In 
the case of the formation of the first stars due to molecular 
hydrogen cooling, the effect of the soft ultraviolet radiation 
from these stars, whi ch tends to dissociate the molecular hy- 
drogen around them ( Haiman et al.l l 19971 iRicotti et al] l2002t 
Oh & Haimarf2 003). must be reassessed with cosmic scatter 
included. 

3.3. Observational consequences 

The spatial fluctuations that we have calculated fundamen- 
tally affect current and future observations that probe reion- 
ization or the galaxy population at high redshift. For exam- 
ple, there are a large number of programs searching for galax- 
ies at the highest accessible redshi fts (6.5 and beyond) using 
their strong Lya emission JHu et alJ2002tlRhoads etail2 003 ; 
iMaier et all2003l iKodaira et alJ2003l) . These programs have 
previously been justified as a search for the reionization red- 
shift, since the intrinsic emission should be absorbed more 
strongly by the surrounding IGM if this medium is neutral. 
For any particular source, it will be hard to clearly recognize 
this enhanced absorption because of uncertainties regarding 
the properties of the source and its radiative and gravitational 
effect s on its surroundings (Barkan a & Loebl2003alb[ISantosl 
120031) . However, if the luminosity function of galaxies that 
emit Lya can be observed, then faint sources, which do not 
significantly affect their environment, should be very strongly 
absorbed in the era before reionization. Reionization can then 
be detected statistically throug h the sudden jump in the num - 
ber of faint sources (Haiman & Spaans 1999; Haiman 2002). 
Our results alter the expectation for such observations. In- 
deed, no sharp "reionization redshift" is expected. Instead, a 
Lya luminosity function assembled from a large area of the 
sky will average over the cosmic scatter of Az ~ 1-2 between 

' iCiardi. Stoehr. & White] 120031) simulated a 30 Mpc box, larger than pre- 
vious simulations, but only gravity was directly simulated, and the mass res- 
olution was three orders of magnitude lower than the minimum necessary to 
resolve most galactic halos at high redshift. 



different regions, resulting in a smooth evolution of the lumi- 
nosity function over this redshift range. In addition, such a 
survey may be biased to give a relatively high redshift, since 
only the most massive galaxies can be detected, and as we 
have shown, these galaxies will be concentrated in overdense 
regions that will also get reionized relatively early. 

The distribution of ionized patches during reionization will 
likely be probed by future observations, including small-scale 
anisotropies of the cosmic microwave back ground ph otons 
that are rescattered by the i onized patches (|Aghanim et alJ 
119961 iGruzinov & Hull 19981 ISantos et alJ 120031) . and obser- 
vations of 21 cm emission b y the spin-flip transition of the 
hydrogen in neutral regions llTozzi et alJ l2000t ICarilli et alJ 
120021 iFurlanetto et all2003l) . Previous analytical and numer- 
ical estimates of these signals have not included the collec- 
tive effects discussed above, in which rare groups of massive 
galaxies may reionize large surrounding areas. These photon 
transfers will likely smooth out the signal even on scales sig- 
nificantly larger than the typical size of an H II bubble due to 
an individual galaxy. Therefore, even the characteristic angu- 
lar scales that are expected to show correlations in such ob- 
servations must be reassessed. 

The cosmic scatter also affects observations in the present- 
day universe that depend on the history of reionization. For 
instance, photoionization heating suppresses the formation of 
dwarf galaxies after reionization, suggesting that the small- 
est galaxies see n toda y may have forme d prior to reion ization 
jBullocket alJl200ll lSomerviUdl200l iBenson et alJ 12002). 
Under the popular view that assumed a sharp end to reioniza- 
tion, it was expected that denser regions would have formed 
more galaxies by the time of reionization, possibly explain- 
ing the larger relative abundance of dwarf galaxies observed 
in galaxy clusters compared to lower-density regions such as 
our Lo cal Group of galaxies ( Tull v et aTll2002l IBenson et alJ 
l2003al) . Our results undercut the basic assumption of this ar- 
gument and suggest a different explanation altogether. Reion- 
ization occurs roughly when the number of ionizing photons 
produced starts to exceed the number of hydrogen atoms in 
the surrounding IGM. If the processes of star formation and 
the production of ionizing photons are equally efficient within 
galaxies that lie in different regions, then reionization in each 
region will occur when the collapse fraction reaches the same 
critical value, even though this will occur at different times in 
different regions. Since the galaxies responsible for reioniza- 
tion have the same masses as present-day dwarf galaxies, this 
estimate argues for a roughly equal abundance of dwarf galax- 
ies in all environments today. This simple picture is, however, 
modified by several additional effects. First, the recombina- 
tion rate is higher in overdense regions at any given time, as 
discussed above. Furthermore, reionization in such regions is 
accomplished at an earlier time when the recombination rate 
was higher even at the mean cosmic density; therefore, more 
ionizing photons must be produced in order to compensate 
for the enhanced recombination rate. These two effects com- 
bine to make overdense regions reionize at a higher value of 
F co \ than underdense regions. In addition, the overdense re- 
gions, which reionize first, subsequently send their extra ion- 
izing photons into the surrounding underdense regions, caus- 
ing the latter to reionize at an even lower F m i. Thus, a higher 
abundance of dwarf galaxies today is indeed expected in the 
overdense regions. 

The same basic effect may be even more critical for un- 
derstanding the properties of large-scale voids, 10-30 Mpc 
regions in the present-day universe with an average mass den- 
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sity that is well below the cosmic mean. In order to pre- 
dict their properties, the first step is to consider the abun- 
dance of dark matter halos within them. Numerical simu- 
lations show t hat voids contain a lower relative abundance 
of rare halos (ICen & Ostrikerl 1200(1 ISomerville et all 1200 U 
Mathis & Whit el2002HBenson et al.l2003bl) . as expected from 
the raising of the collapse threshold for halos within a void. 
On the other hand, simulations show that voids actually place 
a larger fraction of their dark matter content in dwarf halos 
of mass below 1O 1O M (iGottlober et al.ll2003l) . This can be 
understood within the extended Press-Schechter model. At 
the present time, a typical region in the universe fills halos of 
mass 10 12 M Q and higher with most of the dark matter, and 
very little is left over for isolated dwarf halos. Although a 
large number of dwarf halos may have formed at early times 
in such a region, the vast majority later merged with other ha- 
los, and by the present time they survive only as substructure 
inside much larger halos. In a void, on the other hand, large 
halos are rare even today, implying that most of the dwarf 
halos that formed early within a void can remain as isolated 
dwarf halos till the present. Thus, most isolated dwarf dark 
matter halos in t he present univ erse should be found within 
large-scale voids (Barka naj2003l) . 

However, voids are observed to be rather deficient in 
dwarf galaxies as we ll as in larger galaxies on the scale of 
the Milky Wav (e.g.. iKirshner et alJll98ll lEder et alJU989t 
Grogi n & Gellerl 1 1 9991 120001 El-Ad & Piranl 120001 iPeeblesI 
2001). A deficit of large galaxies is naturally expected, since 
the total mass density in the void is unusually low, and the 
fraction of this already low density that assembles in large ha- 
los is further reduced relative to higher-density regions. The 
absence of dwarf galaxies is harder to understand, given the 
higher relative abundance expected for their host dark mat- 
ter halos. The standard model for galaxy formation may be 
consistent with the observations if some of the dwarf halos 
are dark and do not host stars. Large numbers of dark dwarf 
halos may be produced by the effect of reionization in sup- 
pressing the infall of gas into these halos. Indeed, exactly 
the same factors considered above, in the discussion of dwarf 
galaxies in clusters compared to those in small groups, ap- 
ply also to voids. Thus, the voids should reionize last, but 
since they are most strongly affected by ionizing photons from 
their surroundings (which have a higher density than the voids 
themselves), the voids should reionize when the abundance of 
galaxies within them is relatively low. A quantitative analysis 
of how the reionization redshift varies with environment may 
help establish a common framework for explaining the ob- 
served properties of dwarf galaxies in environments ranging 
from clusters to voids. 

4. CONCLUSIONS 

We have shown that the important milestones of high- 
redshift galaxy formation, such as the formation of the first 
stars and the completion of reionization, occurred at signifi- 
cantly different times in different regions of the universe. This 
conclusion results from the fact that the temperature thresh- 
old, above which cooling and fragmentation of gas are possi- 
ble, selects out dark matter halos that become exceptionally 
rare at high redshifts. Consequently, density fluctuations on 
large scales modulate the threshold for the collapse of high 



density peaks on small scales in the exponential tail of the 
Gaussian random field of density fluctuations, and introduce 
a remarkably large scatter in the abundance of star-forming 
galaxies at early cosmic times. 

We have developed an improved method to calculate the 
cosmic scatter (see §2.2). This yields the first self-consistent 
analytic model that matches the halo mass function measured 
in various regions in numerical simulations that covered a 
wide range of the parameter space of region size, mean den- 
sity, and redshift. 

Since the characteristic distance between nearby sources of 
ionizing radiation varies widely across the universe, the over- 
lap of the H II regions produced by these individual sources 
in the IGM occurs at significantly different times in different 
cosmic environments. Quantitatively, we find that the spread 
in the redshift of reionization should be at least an order of 
magnitude larger than previous expectations that argued for a 
sharp end to reionization (see § 3.1). 

Current nu merical simulations that treat g r avity and hy- 
drody namics JGnedinl 120*001 lAbel et alJl2002l: lYoshida etafl 
2003) largely eliminate this real cosmic scatter, and are artifi- 
cially biased toward late galaxy formation since they exclude 
large-scale modes (see Figures 3 and 4). We find that galaxy 
formation within state-of-the-art simulations with 324 3 parti- 
cles is artificially biased to occur too late by a redshift interval 
Az ~ 0.5 at z = 7 and Az ~ 2.5 at z = 20. The box lengt h used 
in state-of-the-art si mulations of reionization ( Gnedin 2000; 
lYoshida etal] f2003) is 1 .5-2 orders of magnitude below the 
minimum size necessary to treat th e scatter reliably, and so 
alternative computational schemes (Barkan a & Loebl l2003c) 
must be implemented in order to quantify the implications 
of the large cosmic scatter on the reionization history. This 
scatter should affect the statistical fluctuations in the number 
and clustering properties of sources in surveys with a narrow 
field of view (such as the Hubble Deep Field), the luminosity 
function of Ly a-emitting galaxies around the reionization red- 
shift, the fluctuations in the 2 1 cm flux produced by the neutral 
IGM, the power spectrum of the secondary anisotropies in the 
cosmic microwave background, and the present abundance of 
dwarf galaxies in various environments (see § 3.3). Simula- 
tions limited to a small box may be able to study the scatter 
in the number density of galaxies by varying the mean den- 
sity of the box, but such simulations cannot probe the global 
structure of reionization since this would involve the radiative 
transfer of ionizing photons over distances larger than the box 
size. 
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